(54) System and method for enhancing the availability of routing systems through equal cost multipath



(57) In a networking environment including one or more network processing (NP) devices and
implementing a routing protocol for routing data packets from source NP devices to destination
NP devices via a switch fabric, with each network processing device supporting a number of
interface ports, a system and method for enabling a routing system to recover more quickly than
the routing protocol so as to significantly reduce the occurrence of lost data packets to a
failed target interface/blade. The routing system is enabled to track the operational status of
each network processor device and the operational status of destination ports supported by each
network processor device in the system, and maintains the operational status as a data structure
at each network processing device. Prior to routing packets, an expedient logical determination
is made as to the operational status of a target network processing device and target interface
port of a current packet to be routed as represented in the data structure maintained at the
source NP device. If the target blade/interface is not operational, an alternative route may be
provided by ECMP. In this manner, correct routing of packets is ensured with reduced occurrence
of lost data packets due to failed target NP devices/ports.
software components are responsible for initializing the system, maintaining the forwarding paths, and managing the
system. From a software view, the system is distributed. The GPP and each picoprocessor run in parallel, with the CP
communicating with each picoprocessor using a predefined application program interface (API) 30 and control protocol.
[0009] The CP code base provides support for the Layer 2 and Layer 3 topology protocols and Layer 4 and Layer 5
network applications and systems management. Examples are protocol support for VLAN, IP, and the Multiprotocol Label
Switching standard (MPLS), and the supporting address- and route-learning algorithms to maintain topology
information.

[0010] With particular reference to Figure 1, and the accompanying description found in commonly-owned, co-pending
U.S. Patent Application Serial No. 09/384,691 filed August 27, 1999 and entitled "NETWORK PROCESSOR
PROCESSING COMPLEX AND METHODS", the whole contents and disclosure of which is incorporated by reference
as if fully set forth herein, the general flow of a packet or frame received at the NP device is as follows: frames received
from a network connection, e.g., Ethernet MAC, are placed in internal data store buffers by an upside "enqueue"
device (EDS-UP) where they are identified as either normal data frames or system control frames (Guided Frames).
In the context of the invention, frames identified as normal data frames are enqueued to an Embedded Processor
Complex (EPC) which comprises a plurality of picoprocessors, e.g., protocol processors. These picoprocessors
execute logic (picocode) capable of looking at the received frame header and deciding what to do with the frame (forward,
modify, filter, etc.). The EPC has access to several lookup tables, and classification hardware assists to allow the
picoprocessors to keep up with the high-bandwidth requirements of the Network Processor. A classification hardware
assist device, in particular, is provided for classifying frames of well-known frame formats. The Embedded Processing
Complex (EPC) particularly provides and controls the programmability of the NP device and includes, among other
components (such as memory, dispatcher, interfaces), N processing units, referred to as GxH, which concurrently
execute picocode that is stored in a common instruction memory. It is understood, however, that the architecture and
structure is completely scalable towards more GxHs, with the only limitation being the amount of silicon area provided
in the chip. In operation, classification results from the classification hardware assist device are passed to the GxH
during frame dispatch. Each GxH preferably includes a Processing Unit core (CLP) which comprises, e.g., a 3-stage
pipeline, general purpose registers and an ALU. Several GxHs, in particular, are defined as General Data Handlers
(GDH), each of which comprises a full CLP with the five coprocessors and is primarily used for forwarding frames. One
GxH coprocessor, in particular, a Tree Search Engine Coprocessor (TSE), functions to access all tables, counters, and
other data in a control memory that are needed by the picocode in performing tree searches used in forwarding data
packets, thus freeing a protocol processor to continue execution. The TSE is particularly implemented for storing and
retrieving information in various processing contexts, e.g., determining frame routing rules, lookup of frame forwarding
information and, in some cases, frame alteration information.

[0011] The traditional frame routing capability provided in network processor devices typically utilizes a network routing
table having entries which provide a single next hop for each table entry. Commonly-owned, co-pending United States
Patent Application Serial No. 09/546,702 entitled METHOD FOR PROVIDING EQUAL COST MULTIPATH FORWARDING
IN A NETWORK PROCESSOR, the whole content and disclosure of which is incorporated by reference as if fully set
forth herein, describes a system and method for providing the ability for a network processor to select from multiple
next hop options for a single forwarding entry.

[0012] Figure 2(a) depicts an example network processor frame routing scenario 40 and Figure 2(b) illustrates an
example Equal Cost Multipath Forwarding (ECMP) table 50 that may be used to provide a lookup of a nextHop address
for forwarding packets as described in commonly-owned, co-pending United States Patent Application Serial No.
09/546,702. Preferably, such a table is employed in a Network Processor (NP) device having packet routing functions
such as described in commonly-owned, co-pending U.S. Patent Application Serial No. 09/384,691.
[0013] Thus, the example ECMP forwarding table 50 illustrated in Figure 2(b) is particularly implemented in a frame
forwarding context for network processor operations. In the example ECMP forwarding table 50, there are provided
subnet destination address fields 52, with each forwarding entry including multiple next hop routing information
comprising multiple next hop address fields, e.g., fields 60a - 60c. Additionally provided in the ECMP routing table is
cumulative probability data for each corresponding next hop, such as depicted in action data field 70. Particularly, in the
exemplary illustration of the ECMP packet forwarding table 50 of Figure 2(b), there are included three (3) next hop fields
to addresses 9.1.1.1, 8.1.1.1, 6.1.1.1 associated with a destination subnet address 7.*.*.*. An action data field 70
includes threshold values used to weight the probability of each next hop and is used to determine which next hop will
be chosen. In the action field 72, shown in Figure 2(b), these values are stored as cumulative percentages, with
the first cumulative percentage (30%) corresponding to next hop 0, the second cumulative percentage value (80%)
corresponding to next hop 1, etc. This means that the likelihood of routing a packet through next hop 0 is 30% (i.e.,
approximately 30% of traffic for the specified table entry should be routed to next hop 0), and the likelihood of routing
a packet through next hop 1 is 50%, the difference between the second and first cumulative values (i.e., approximately
50% of traffic for the specified table entry should be routed to next hop 1). This technique may be extended to offer as
many next hops as desired or feasible.
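The cumulative thresholds of the action data field can be read as a lookup over a per-packet value in the range 0 to 99. The following C sketch shows one way such a selection could work. The names `EcmpEntry` and `ecmp_select`, the final threshold of 100%, and the idea that a selector value (for example, a hash of the packet header) is supplied by the caller are assumptions made for illustration; they are not taken from the referenced application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NEXT_HOPS 3

/* Hypothetical layout of one ECMP forwarding entry (names are illustrative). */
typedef struct {
    const char *next_hop[MAX_NEXT_HOPS];  /* next hop addresses */
    uint8_t     cum_pct[MAX_NEXT_HOPS];   /* cumulative percentage thresholds */
} EcmpEntry;

/* Return the index of the first next hop whose cumulative threshold is
 * greater than the selector, after reducing the selector to 0..99. */
int ecmp_select(const EcmpEntry *e, uint32_t selector)
{
    assert(e != NULL);
    uint32_t v = selector % 100;
    for (int i = 0; i < MAX_NEXT_HOPS; i++) {
        if (v < e->cum_pct[i])
            return i;
    }
    /* Defensive fallback in case the last threshold is below 100. */
    return MAX_NEXT_HOPS - 1;
}
```

With thresholds of 30, 80 and an assumed 100, selector values 0 to 29 map to next hop 0, values 30 to 79 map to next hop 1, and the remainder map to next hop 2, which reproduces the 30% and 50% shares described above.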
[0014] Currently, in such network processing systems, if a destination NP device (hereinafter referred to as Target- 
If the ith bit is set, for instance, then the ith NP/blade is indicated as operational.
In operation, after choosing the next hop according to ECMP rules, the layer-3
forwarding picocode will check the operational status of the NP/blade through which the chosen next hop is reachable.
If that NP is not operational, then a different equal-cost next hop (the next hop with the smallest index) that is reachable
through an operational NP/blade will be chosen. Figure 3 illustrates the determination of a failed link to the Target Blade
associated with ECMP next hop destination NP1, and the resulting decision to re-route the frame to an operational
destination NP2 according to the ECMP table. That is, in each NP, the operational status of the TB for each packet
routed is checked. If the destination TB is down, then a different Next Hop is chosen as suggested by the ECMP table.
It should be understood that the particular user application will detect failures and update the opStatus data structure
accordingly.
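The blade-level check and fallback just described can be sketched in C as follows. This is a minimal illustration, assuming a 64-bit opStatus word (one bit per blade) and a hypothetical array `hop_blade` that records the blade through which each next hop is reachable; the function names are likewise illustrative and do not come from the picocode.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_NEXT_HOPS 3

/* Bit i set means NP/blade i is operational (64 blades assumed). */
typedef uint64_t OpStatus;

static int blade_is_up(OpStatus op_status, unsigned blade)
{
    assert(blade < 64);
    return (int)((op_status >> blade) & 1u);
}

/* hop_blade[i] is the blade through which next hop i is reachable.
 * Returns the ECMP-chosen hop if its blade is up; otherwise the
 * smallest-index hop whose blade is up; or -1 if no hop is reachable. */
int choose_operational_hop(OpStatus op_status,
                           const unsigned hop_blade[NUM_NEXT_HOPS],
                           int chosen)
{
    assert(chosen >= 0 && chosen < NUM_NEXT_HOPS);
    if (blade_is_up(op_status, hop_blade[chosen]))
        return chosen;
    for (int i = 0; i < NUM_NEXT_HOPS; i++) {
        if (blade_is_up(op_status, hop_blade[i]))
            return i;
    }
    return -1;
}
```

The return value of -1 for the case where no next hop is reachable is a choice made for this sketch; the text does not say how that case is handled.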

[0021] This first solution essentially maintains the operational status at the TB (blade)/NP level. In order to extend
this solution to an interface/port (TB/TP) level, a data structure that is 64x16 bits long needs to be maintained,
assuming each blade in the example system maintains sixteen (16) ports, for instance. Since the opStatus data structure
is consulted in the main forwarding path, it must be stored in a fast, expensive memory.
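As an illustration of the port-level extension, the 64x16-bit structure could be laid out as one 16-bit word per blade. The following C sketch assumes that layout; the names `PortOpStatus`, `port_is_operational` and `mark_port` are hypothetical and are used only to show the size and the per-packet lookup cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLADES      64
#define PORTS_PER_BLADE 16

/* One 16-bit word per blade; bit p set means port p of that blade is up.
 * Total size: 64 x 16 = 1024 bits. */
typedef struct {
    uint16_t port_up[NUM_BLADES];
} PortOpStatus;

/* Returns 1 if target port tp on target blade tb is operational. */
int port_is_operational(const PortOpStatus *s, unsigned tb, unsigned tp)
{
    assert(s != NULL && tb < NUM_BLADES && tp < PORTS_PER_BLADE);
    return (s->port_up[tb] >> tp) & 1;
}

/* Record a status change reported by the failure-detecting application. */
void mark_port(PortOpStatus *s, unsigned tb, unsigned tp, int up)
{
    assert(s != NULL && tb < NUM_BLADES && tp < PORTS_PER_BLADE);
    if (up)
        s->port_up[tb] |= (uint16_t)(1u << tp);
    else
        s->port_up[tb] &= (uint16_t)~(1u << tp);
}
```

The lookup is a single indexed read plus a shift, but the whole 1024-bit table has to reside in the fast memory used by the forwarding path.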
[0022] Another solution relies on the assumption that interface/blade failures are rare and that it is unlikely that more
than one blade will fail at the same time. The advantage of tracking a single failure is the reduction of the size of the
opStatus data structure. The current solution only requires 48 bits (three 16-bit fields) in expensive high-speed memory,
whereas the previous solution required 64 x 16 bits in such a memory. Thus, the following data structure may be
maintained in each NP device in the routing system.

{
    Uint16 failedBlade;      /* Use the value of 0xffff if all blades are operational */
    Uint16 failedPortMask;
    Uint16 failedPortValue;
}

[0023] According to this embodiment, the following algorithm is invoked to check whether a given TB, TP is operational:



Boolean isOperational(Uint16 TB, Uint16 TP) {
    if (failedBlade == 0xffff)
        /* all blades are operational */
        return TRUE;

    /* where && is the logical AND operator */
    /* where & is the bitwise AND operator; it is parenthesized because */
    /* == binds more tightly than & */
    if ((TB == failedBlade) && ((TP & failedPortMask) == failedPortValue))
        return FALSE;
    else
        return TRUE;
}

[0024] According to this algorithm, if all blades are operational, the routing of packets throughout the system will
continue and no ECMP re-routing is necessary. A FALSE is returned only if both the Target Blade is the failed blade AND
the result of the bitwise AND operation between the Target Port and failedPortMask is equal to failedPortValue; in that
case the ECMP table is invoked for re-routing subsequent packets to another TB or TP. If a TRUE is returned,
If the blade numbered bladeNum is not operational (i.e., all the ports in that blade have failed) then, according to this
algorithm, 

beginFailedBlade and endFailedBlade are set as bladeNum, 

failedPortMask is set as 0, and 
failedPortValue is set as 0.

[0030] However, if the blades numbered, for example 8, 9, and 10 are not operational then set 

beginFailedBlade as 8 

endFailedBlade as 10 

failedPortMask as 0 and 
failedPortValue as 0

[0031] If the port numbered portNum in the blade numbered bladeNum is not operational, then, according to this 
algorithm, 

beginFailedBlade is set as bladeNum 

endFailedBlade is set as bladeNum 
failedPortMask is set as 0xff

failedPortValue is set as portNum 
[0032] The ports in DMU A have last (least significant) 2 bits set to 00. The ports in DMU B have last 2 bits set to 
01. The ports in DMU C have last 2 bits set to 10 and the ports in DMU D have last 2 bits set to 11. In an example 
scenario when all the ports in DMU C fail in blade numbered bladeNum, then, according to this algorithm, 
beginFailedBlade is set as bladeNum

endFailedBlade is set as bladeNum 

failedPortValue is set as 0b 0000 0010 and 

failedPortMask is set as 0b 0000 0011 
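The field settings listed above can be checked against a range-based test. The following C sketch is an assumption about how such a test might look: it treats a target as failed when its blade lies between beginFailedBlade and endFailedBlade inclusive and its masked port number equals failedPortValue. The struct name `OpStatusRange`, the function name `is_operational`, and the reuse of 0xffff in beginFailedBlade as the "all operational" sentinel are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel, carried over from the single-blade failedBlade field. */
#define ALL_OPERATIONAL 0xffffu

typedef struct {
    uint16_t beginFailedBlade;
    uint16_t endFailedBlade;
    uint16_t failedPortMask;
    uint16_t failedPortValue;
} OpStatusRange;

/* Returns 1 if target blade tb / target port tp is operational, else 0. */
int is_operational(const OpStatusRange *s, uint16_t tb, uint16_t tp)
{
    assert(s != NULL);
    if (s->beginFailedBlade == ALL_OPERATIONAL)
        return 1;
    /* A mask of 0 with a value of 0 matches every port of the blade. */
    if (tb >= s->beginFailedBlade && tb <= s->endFailedBlade &&
        (tp & s->failedPortMask) == s->failedPortValue)
        return 0;
    return 1;
}
```

Under these assumptions, a mask and value of 0 mark every port in blades 8 through 10 as failed, a mask of 0xff with value portNum marks a single port, and a mask of 0b0000 0011 with value 0b0000 0010 marks exactly the ports whose last two bits are 10, i.e., the ports in DMU C.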
[0033] While the invention has been particularly shown and described with respect to illustrative and preferred
embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and
details may be made therein without departing from the spirit and scope of the invention, which should be limited only
by the scope of the appended claims.



Claims

1. In a networking environment comprising one or more network processing (NP) devices for routing data packets
from a source to a destination via a switch fabric, with each network processing device supporting a number of
interface ports, a system for ensuring packet routing from one network processing device to a target network
processing device via a target interface port, said system comprising:

mechanism for tracking operational status of each network processor device and operational status of
destination ports supported by each said network processor device in said system, said operational status being
maintained at each network processing device;

said network processor devices including mechanism for determining the operational status of a target network
processing device and target interface port of a current packet to be routed prior to said routing,

routing mechanism for routing packets from source NP devices to destination NP devices and destination
ports thereof in accordance with a packet routing protocol, said routing mechanism routing said current packet
to a target network processor device and destination port when said target network processor device and
destination ports thereof are determined as operational, and routing packets to another operational NP device
and port thereof upon determination of non-operational target network processor device and destination port,
whereby proper routing of packets is guaranteed with minimum packet loss.

2. The system for ensuring packet routing in accordance with Claim 1, wherein said routing mechanism implements
an Equal Cost Multi-Path (ECMP) protocol including next hop routing table for mapping a destination address
associated with a packet to be forwarded to one or more next hop options in said networking environment.

3. The system for ensuring packet routing in accordance with Claim 1, wherein each network processor device
maintains a data structure receiving values from said tracking mechanism indicating status of said network processor
devices, said determining mechanism implementing logic for comparing said received value against a first value
indicating all NP devices are operational prior to routing of a current packet, and initiating routing of said packet
to said target when said values match.
13. The method for ensuring packet routing in accordance with Claim 10, wherein said determining step b) includes
the step of implementing logic for comparing said received value against a first value indicating all interface ports 
for said NP devices are operational prior to routing of a current packet, and initiating routing of said packet to said 
NP device and target port when said values match. 

14. The method for ensuring packet routing in accordance with Claim 13, wherein said first value includes a set of 
mask bits and a set of bits representing said target destination port, said determining step b) including the step of 
implementing bitwise logic for comparing said received value against said mask bit set and obtaining a first result, 
comparing said first result against said target destination port bits, and initiating re-routing of said packet to another 
destination port when said first result does not match said target destination port bits. 

15. The method for ensuring packet routing in accordance with Claim 13, wherein said data structure receives two 
values defining a range of NP devices that are not operational, said determining step b) implementing logic for 
comparing a bit representation of a target NP device of a packet to be routed against each of said two values
defining said range, and initiating re-routing of said packet to another destination port outside said range when 
said bit representation of said target NP device falls within said two values. 

16. A program storage device readable by a machine, tangibly embodying a program of instructions executable by 
the machine to perform method steps for ensuring packet routing in a networking environment comprising one or 
more network processing (NP) devices for routing data packets from a source to a destination via a switch fabric, 
with each network processing device supporting a number of interface ports, said method steps comprising: 

a) tracking operational status of each network processor device and operational status of destination ports 
supported by each said network processor device in said system, and maintaining said operational status at 
each network processing device; 

b) determining the operational status of a target network processing device and target interface port of a current 
packet to be routed prior to said routing at a current NP device; and, 

c) routing packets from source NP devices to destination NP devices and destination ports thereof in
accordance with a packet routing protocol, a current packet being routed to a target network processor device and
destination port when said target network processor device and destination ports thereof are determined as
operational, or being routed to another operational NP device and port thereof upon determination of
non-operational target network processor device and destination port, whereby proper routing of packets is
guaranteed with minimum packet loss.

17. The program storage device readable by a machine in accordance with Claim 16, wherein said routing of packets 
from source NP devices to destination NP devices and destination ports thereof is in accordance with Equal Cost 
Multi-Path (ECMP) protocol, said routing step c) including mapping a destination address associated with a packet 
to be forwarded to one or more next hop options in said networking environment. 

18. The program storage device readable by a machine in accordance with Claim 16, wherein said step of maintaining
said operational status includes maintaining a data structure for receiving values determined from said tracking 
step indicating status of said network processor devices. 

19. The program storage device readable by a machine in accordance with Claim 18, wherein said determining step 
b) includes the step of implementing logic for comparing a received value against a first value indicating all NP 
devices are operational prior to routing of a current packet, and initiating routing of said packet to said target when 
said values match. 

20. The program storage device readable by a machine in accordance with Claim 19, wherein said received value is 
a second value representing a particular NP device that is not operational, said determining step b) including the 
step of implementing logic for comparing a bit representation of a target NP device of a packet to be routed against 
this received second value and initiating routing of said packet to another NP device when said target NP device 
is not operational. 

21. The program storage device readable by a machine in accordance with Claim 20, wherein said determining step 
b) includes the step of implementing logic for comparing said received value against a first value indicating all 
interface ports for said NP devices are operational prior to routing of a current packet, and initiating routing of said 
packet to said NP device and target port when said values match. 